
Fix the catalog, making use of rc3 fixes #92

Merged 4 commits on May 29, 2020

Conversation

@beckjake (Contributor) commented May 26, 2020

resolves #85
resolves #89
resolves #90

Of course, 0.17.0rc3 didn't exist yet when I opened this, but it does now, it fixes these problems, and this PR points to it.
If an issue was resolved by dbt-labs/dbt-core#2489, I marked it as resolved by this PR as well; peeling the consequences of the two apart sounds annoying.

This PR takes the opinionated stance that you aren't allowed to use database in Spark: it's always None. There is probably room to go further (maybe spark__generate_database_name should raise if your custom_database_name is non-None), but I think this is functional.
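Below is a rough sketch of how that stance can be enforced at the relation level. The exact shape of SparkRelation here is an assumption (the check and the error message mirror what shows up later in this thread), so treat it as illustrative rather than the precise code in this PR:

    from dataclasses import dataclass

    from dbt.adapters.base.relation import BaseRelation
    from dbt.exceptions import RuntimeException

    @dataclass(frozen=True, eq=False, repr=False)
    class SparkRelation(BaseRelation):
        def __post_init__(self):
            # Spark has a single two-level namespace, so a database that
            # is set and differs from the schema can never be honored;
            # fail fast instead of silently ignoring it.
            if self.database and self.database != self.schema:
                raise RuntimeException('Cannot set database in spark!')

A check like this runs for every relation as it is constructed, including the ones dbt builds for sources and snapshots.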

  • Remove _parse_relation: it fully duplicates other code now.
  • Make use of dbt's DEFAULT_TYPE_TESTER when creating the agate table. This prevents an issue where stats values of 1 were coerced to True in the statistics output (see the sketch after this list).
  • Override the core generate_database_name macro to return None.
  • Statistics and table owners now work, hooray.
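To illustrate the coercion issue from the second bullet: agate's default TypeTester tries Boolean before Number, and agate.Boolean happily casts the integers 0 and 1, so a stats column full of 1s is inferred as boolean. A minimal sketch (the num_rows column name is made up, and DEFAULT_TYPE_TESTER is assumed to live in dbt.clients.agate_helper as of dbt 0.17):

    import agate

    from dbt.clients.agate_helper import DEFAULT_TYPE_TESTER

    rows = [(1,), (1,)]

    # Default inference: Boolean is tried first and accepts 0/1,
    # so the column comes back as booleans.
    naive = agate.Table(rows, column_names=['num_rows'])
    print(naive.rows[0]['num_rows'])  # True

    # dbt's tester tries Number first and only accepts the strings
    # 'true'/'false' as booleans, so the value stays numeric.
    fixed = agate.Table(
        rows, column_names=['num_rows'], column_types=DEFAULT_TYPE_TESTER
    )
    print(fixed.rows[0]['num_rows'])  # 1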

@jtcohen6 (Contributor) commented

I took this for a spin, and I'm still seeing the same two errors as before:

  1. If a source or snapshot doesn't have a database/target_database defined that's equal to its source_name/schema/target_schema, parsing fails. In the example below, dbt_default is the target.schema configured in profiles.yml, and dbt_snapshot is configured from within the snapshot block.

     Error while parsing relation test_ts:
         identifier: test_ts
         schema: dbt_snapshot
         database: dbt_default
     On Spark, database should not be set. Use the schema config to set a custom schema/database for this relation.

  2. After manually setting the database/target_database param, I see the following when trying to generate docs:

     Encountered an error while generating catalog: Compilation Error
         Expected only one schema in spark _get_one_catalog
     dbt encountered 1 failure while writing the catalog

I think we still need a way to:

  • Set database = None for all nodes/relations, including sources and snapshots
  • Tell dbt to _get_one_catalog per unique schema instead of per unique database

@jtcohen6 (Contributor) commented

I checked out the latest code in this branch, and I'm still seeing similar issues as above.

Sources + snapshots

The first error case now returns Cannot set database in spark! instead of On Spark, database should not be set.... To replicate:

  • Add a source with a name (or schema) that is different from target.schema (as configured in the profile), and missing a database param
  • Or, as above but a snapshot with target_schema that is different from target.schema (from profile), and missing target_database
  • Run dbt compile

The way to resolve the error currently is by setting the database config to the same value as the schema. This double-setting shouldn't be necessary, and it's counterintuitive because we don't want users to use the database configs at all.

Catalog generation

The second error case is still Expected only one schema in spark _get_one_catalog. To replicate:

  • Add a model with a custom schema configured.
  • Run dbt docs generate

Somehow, dbt needs to know to run _get_one_catalog once per schema instead of once per database (which should now always be None). It looks like the run-once-per-database behavior is implicit within the information_schema argument... is there a way around this?

Next steps?

Ideally, the scenarios above would be included in our integration testing suite, which we know is a ways away from where it needs to be. If it would be helpful in continuing work on this branch, I can try adding to the dbt-integration-tests spark branch:

  • a source
  • a model with a custom schema

I think the highest-priority next step is figuring out if these are fixes we can make within the plugin, or if there is a lingering core issue in the way.

Commits:
  • Fix catalog generation to properly handle 1 database (None!) with many schemas
  • Update unit tests
@beckjake (Contributor, Author) commented

I've added fixes to the issues you identified.

Re: "It looks like the run-once-per-database behavior is implicit within the information_schema argument... is there a way around this?"

This happens in the adapter portion of the code, so yup!
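For the curious, here is roughly what handling it in the adapter can look like. The method names (_get_catalog_schemas, _get_one_catalog, raise_compiler_error) follow dbt 0.17's base adapter, but the body is a simplified sketch that runs sequentially where dbt actually uses a thread pool, not the exact code merged here:

    import agate

    import dbt.exceptions
    from dbt.adapters.sql import SQLAdapter

    class SparkAdapter(SQLAdapter):
        def get_catalog(self, manifest):
            # Spark has exactly one database (None), so there should be
            # exactly one information_schema entry in the map.
            schema_map = self._get_catalog_schemas(manifest)
            if len(schema_map) > 1:
                dbt.exceptions.raise_compiler_error(
                    f'Expected only one database in get_catalog, '
                    f'found {list(schema_map)}'
                )

            # Fan out once per schema instead of once per database,
            # handing _get_one_catalog a single-schema list each time.
            catalogs = []
            for info_schema, schemas in schema_map.items():
                for schema in schemas:
                    catalogs.append(
                        self._get_one_catalog(info_schema, [schema], manifest)
                    )
            return agate.Table.merge(catalogs), []

The key move is the inner loop: the per-database grouping implied by the information_schema argument collapses to a single entry, and each schema gets its own catalog query.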

@jtcohen6 (Contributor) left a comment

This works like a dream! Thank you for taking it the last mile.

Successfully merging this pull request may close these issues:

  • Re-add table owner + stats to catalog
  • Master not working with sources